Finding a Good Collection of Patterns Covering a Set of Sequences
Abstract
The papers in the series are intended for internal use and are distributed by the author. Copies may be ordered from the library of the Department of Computer Science.

We consider the problem of learning unions of pattern languages from positive examples. We consider three different classes of patterns: regular patterns, substring patterns, and the so-called PROSITE patterns. Regular patterns are patterns in which each variable symbol can appear only once. Substring patterns are the subclass of regular patterns of the form xαy, where x and y are variables and α is a string of constant symbols. PROSITE patterns are a class of patterns used for the classification of bio-sequences in the PROSITE database. We present an algorithm which, given a set of sequences, finds a 'good' collection of patterns 'covering' this set. A 'good covering' is defined as the most probable collection of patterns likely to produce the examples in a simple and natural probabilistic model. We show that this criterion is equivalent to the so-called Minimum Description Length (MDL) principle. We present a polynomial-time algorithm for approximating the optimal cover within a logarithmic factor and prove its performance guarantees. In the case of substring patterns, the running time of the algorithm is almost linear.
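The abstract does not spell out the algorithm itself. Logarithmic approximation factors are characteristic of the classic greedy heuristic for set cover, so the following is a minimal sketch of that heuristic applied to substring patterns (a constant string flanked by two variables), not the authors' method. The function name `greedy_substring_cover`, the length bounds `min_len` and `max_len`, and the selection rule (most newly covered sequences, ties broken by longer and then lexicographically smaller string) are all assumptions made for illustration; in particular, the rule stands in for the MDL-based cost, which the abstract does not define.

```python
from collections import defaultdict


def greedy_substring_cover(sequences, min_len=2, max_len=4):
    """Greedy set cover over substring patterns (hypothetical helper).

    Each candidate pattern is a constant string; it covers every sequence
    that contains it. Returns a list of (constant_string, covered_indices).
    """
    # Map every candidate constant string to the sequences containing it.
    occurs = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                occurs[seq[start:start + length]].add(idx)

    uncovered = set(range(len(sequences)))
    cover = []
    while uncovered:
        # Assumed selection rule: most newly covered sequences first,
        # then the longer string, then the lexicographically smaller one.
        best = min(
            occurs,
            key=lambda s: (-len(occurs[s] & uncovered), -len(s), s),
            default=None,
        )
        if best is None or not (occurs[best] & uncovered):
            break  # leftover sequences are shorter than min_len
        newly = occurs[best] & uncovered
        cover.append((best, sorted(newly)))
        uncovered -= newly
    return cover


if __name__ == "__main__":
    for pattern, covered in greedy_substring_cover(
        ["ACGTAC", "TTACGA", "GGCATG", "CATTTT"]
    ):
        print(pattern, covered)
```

This sketch enumerates substrings explicitly, so it does not reach the almost linear running time claimed for substring patterns; it only illustrates the covering step.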
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It aims to discover all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Multigranulation single valued neutrosophic covering-based rough sets and their applications to multi-criteria group decision making
In this paper, three types of (philosophical, optimistic and pessimistic) multigranulation single valued neutrosophic (SVN) covering-based rough set models are presented, and these three models are applied to the problem of multi-criteria group decision making (MCGDM). Firstly, a type of SVN covering-based rough set model is proposed. Based on this rough set model, three types of mult...
A set-covering formulation for a drayage problem with single and double container loads
This paper addresses a drayage problem, which is motivated by the case study of a real carrier. Its trucks carry one or two containers from a port to importers and from exporters to the port. Since up to four customers can be served in each route, we propose a set-covering formulation for this problem where all possible routes are enumerated. This model can be efficiently solved to optimality b...
Capacitated Single Allocation P-Hub Covering Problem in Multi-modal Network Using Tabu Search
The goals of hub location problems are finding the location of hub facilities and determining the allocation of non-hub nodes to these located hubs. In this work, we discuss the multi-modal single allocation capacitated p-hub covering problem over fully interconnected hub networks. Therefore, we provide a formulation to this end. The purpose of our model is to find the location of hubs and the ...
IRF and ISRF Sequences and their Anti-Pedagogical Value
Initiation, Response, and Feedback (IRF) sequences are the most frequent interaction pattern in any classroom context. IRF sequences have been examined extensively in previous studies and were reported to be negatively correlated with participation opportunities (Kasper, 2006; Cazden, 2001; Ellis, 1994). In all these studies, all contingent factors of any classroom context which might influence in...
Publication date: 1995